Water Resources in Copenhagen during 20th and 21st century
This script visualizes the access to water resources across Copenhagen suburbs over time as a digital component for a course started in the Spring 2022, called City: Between Culture and Nature, taught by Mikkel Thelle and Mikkel Høghøj. The course surveys the gradual appearance of private and public bathing facilities, toilets and communal hygienic resources in the city of Copenhagen during the 20th century. By editing elements in this script, you can plot and explore different aspects of past and present hygienic amenities across the suburbs in the capital of Denmark.
As you are plotting you should be trying to answer the question of
Which Copenhagen suburbs are the most/least well-off in terms of hygiene, and why?
Before we start: data wrangling
First load the packages necessary for spatial data visualisation and analysis.
Spatial data
Next, load your spatial data - polygons representing the suburbs of
Copenhagen. On some computers, you may need to use the
options = "ENCODING=WINDOWS-1252" argument to work around
the special characters in this shapefile. Also, if the shapefile is
producing invalid shape errors, st_make_valid() function
will fix residual encoding issues in the geometry column.
suburbs <- st_read("../data/bydel.shp", options = "ENCODING=WINDOWS-1252") # thanks to Malte for fixing the Danish special characters recognition on my computeroptions: ENCODING=WINDOWS-1252
Reading layer `bydel' from data source
`C:\Users\Adela\Documents\RStudio\1_Teaching\cds-spatial\data\bydel.shp'
using driver `ESRI Shapefile'
Simple feature collection with 10 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 12.45305 ymin: 55.61284 xmax: 12.73425 ymax: 55.73271
Geodetic CRS: WGS 84
Simple feature collection with 6 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 12.46344 ymin: 55.63174 xmax: 12.73425 ymax: 55.73271
Geodetic CRS: WGS 84
id bydel_nr navn areal_m2
5 6 6 Vanløse 6699011
6 5 5 Valby 9234177
7 4 4 Vesterbro-Kongens Enghave 8472602
8 1 1 Indre By 10488466
9 9 9 Amager Øst 9820410
10 2 2 Østerbro 9858740
geometry
5 MULTIPOLYGON (((12.4982 55....
6 MULTIPOLYGON (((12.52434 55...
7 MULTIPOLYGON (((12.54553 55...
8 MULTIPOLYGON (((12.72897 55...
9 MULTIPOLYGON (((12.63082 55...
10 MULTIPOLYGON (((12.59777 55...
[1] 3 7 8 10 6 5 4 1 9 2
Attribute data
Next let’s bring in the attribute data. You can read the data in from
a googlesheet with read_sheet() function from the
googlesheets4 package, or you can use the
read_csv() function to read the wc.csv
provided in the data folder.
# Uncomment the lines below to read data from GDrive wc <-
# read_sheet('https://docs.google.com/spreadsheets/d/1iFvycp6M6bF8GBkGjA2Yde2yCIhiy5_slAkGF-RUF7w/edit#gid=0',
# col_types = 'cnnnnnnnn') write_csv(wc, '../data/wc.csv')
wc <- read_csv("../data/wc.csv")
# Check out the data and try to grasp what is being summarized in the columns
wc# A tibble: 100 × 9
Suburb suburb_id flats wc_access bath year bath_communal_ct wc_communal_ct
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Indre … 1 16280 11310 3800 1950 NA NA
2 Christ… 1 5490 3900 900 1950 NA NA
3 Voldkv… 1 13460 12690 4560 1950 NA NA
4 Østerb… 2 30820 28900 13750 1950 NA NA
5 Indre … 3 28700 24380 5910 1950 NA NA
6 Ydre N… 3 21710 20410 5800 1950 NA NA
7 Vester… 4 25850 23930 3730 1950 NA NA
8 Kongen… 4 6270 6240 4240 1950 NA NA
9 Valby 5 14430 13970 8190 1950 NA NA
10 Vigers… 5 7700 7580 5050 1950 NA NA
# ℹ 90 more rows
# ℹ 1 more variable: hot_water <dbl>
Can you quess why the suburb_ids are repeated across multiple suburbs?
Spatial resolution adjustment - data aggregation
Data on access to hygienic facilities and other water resources in
Copenhagen now looks good and tidy, but its spatial resolution
is higher than the provided polygons (as in we have multiple rows that
all fit within one suburb id). We therefore use the
group_by() function to aggregate the data by id before we
continue with any spatial operations. Given that the dataset is in fact
a time-series, and each kvarter has a record for a given
year or decade, we need to group first by the year and then
only by id.
While aggregating the finer scale data into larger units, it is
convenient to generate some statistics, such as percentages of flats
that have bath and wc and hot water access within each suburb. We do
this using the summarize() function below.
wcdata <- wc %>%
group_by(year, suburb_id) %>%
summarize(flats = sum(flats), bath = sum(bath), pct_bath = round(bath/flats *
100, 2), wc_access = sum(wc_access), pct_wc = round(wc_access/flats * 100,
2), warmH20 = sum(hot_water), pct_wH20 = round(warmH20/flats * 100, 2), communal_wc = sum(wc_communal_ct),
communal_bath = sum(bath_communal_ct))
wcdata# A tibble: 50 × 11
# Groups: year [5]
year suburb_id flats bath pct_bath wc_access pct_wc warmH20 pct_wH20
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1950 1 35230 9260 26.3 27900 79.2 8530 24.2
2 1950 2 30820 13750 44.6 28900 93.8 10750 34.9
3 1950 3 50410 11710 23.2 44790 88.8 8810 17.5
4 1950 4 32120 7970 24.8 30170 93.9 6110 19.0
5 1950 5 22130 13240 59.8 21550 97.4 10830 48.9
6 1950 6 10260 6780 66.1 10120 98.6 6270 61.1
7 1950 7 27260 14790 54.3 26770 98.2 13280 48.7
8 1950 8 19270 15000 77.8 18980 98.5 14690 76.2
9 1950 9 23960 12470 52.0 22950 95.8 11210 46.8
10 1950 10 18000 9030 50.2 16200 90 7800 43.3
# ℹ 40 more rows
# ℹ 2 more variables: communal_wc <dbl>, communal_bath <dbl>
Questions:
What percentage of flats in Copenhagen have access to a bath in 1950?
In what year is the percentage of flats with access to a WC in Copenhagen the lowest, and how much it is?
# A tibble: 5 × 4
year flatsC bathC pct_bath
<dbl> <dbl> <dbl> <dbl>
1 1950 269460 114000 42.3
2 1955 272480 124460 45.7
3 1965 281510 151350 53.8
4 1970 285214 151974 53.3
5 1981 272334 165960 60.9
# A tibble: 5 × 4
year flatsC wcC pct_wcC
<dbl> <dbl> <dbl> <dbl>
1 1950 269460 248330 92.2
2 1955 272480 270000 99.1
3 1965 281510 279970 99.5
4 1970 285214 253434 88.9
5 1981 272334 254358 93.4
Join the aggregated attribute data to their spatial representations
Now we can join the data on water resources with the spatial polygons for suburbs
wc_spatial <- suburbs %>%
merge(wcdata, by.x = "id", by.y = "suburb_id") %>%
st_make_valid()
wc_spatialSimple feature collection with 50 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 12.45305 ymin: 55.61284 xmax: 12.73425 ymax: 55.73271
Geodetic CRS: WGS 84
First 10 features:
id bydel_nr navn areal_m2 year flats bath pct_bath wc_access pct_wc
1 1 1 Indre By 10488466 1981 26413 14035 53.14 22546 85.36
2 1 1 Indre By 10488466 1950 35230 9260 26.28 27900 79.19
3 1 1 Indre By 10488466 1965 32470 12780 39.36 32450 99.94
4 1 1 Indre By 10488466 1970 30440 11386 37.40 22381 73.52
5 1 1 Indre By 10488466 1955 33490 9740 29.08 33430 99.82
warmH20 pct_wH20 communal_wc communal_bath geometry
1 NA NA NA NA MULTIPOLYGON (((12.72895 55...
2 8530 24.21 NA NA MULTIPOLYGON (((12.72895 55...
3 NA NA 9510 2280 MULTIPOLYGON (((12.72895 55...
4 NA NA NA NA MULTIPOLYGON (((12.72895 55...
5 9130 27.26 NA NA MULTIPOLYGON (((12.72895 55...
[ reached 'max' / getOption("max.print") -- omitted 5 rows ]
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[46] TRUE TRUE TRUE TRUE TRUE
Now that we have a merged spatial dataset with attributes, let’s review what attributes are available for visualisation.
[1] "id" "bydel_nr" "navn" "areal_m2"
[5] "year" "flats" "bath" "pct_bath"
[9] "wc_access" "pct_wc" "warmH20" "pct_wH20"
[13] "communal_wc" "communal_bath" "geometry"
There is the suburb polygon data, such as id,
bydel_nr, navn and areal_m2, and
there is also the attribute data such as year,
flats, bath ,etc. This gives us lots of
choices for display. Lets put the data in a map.
Plot the data on the map
Let’s start by plotting one year alone, to learn how the map works.
Flats and water resources in 1950
Run the whole chunk below, and once it renders, look at the map.
Afterwards, try changing the definition of what is to be displayed on
line 116. For example, replace "flats" for some other
column, such as "pct_bath", or "wc_access" to
see how the map changes. To modify the legend, you can modify line 118
where we describe style. Replace
style = "jenks" with "pretty", or
"equal", or "quantile". What happens to your
classification?
# Practice on one single year
wc1950 <- wc_spatial %>%
filter(year == 1950)
# Your tmap might not find the polygons fully valid. You can fix them this way:
library(tmap)
tmap_options(check.and.fix = TRUE)
# Map! # try other attributes in tm_polygons, such as 'pct_bath' here
tmap_mode(mode = "plot")
tm_shape(wc1950) + tm_borders(col = "black", lwd = 1) + tm_polygons("flats", id = "navn",
style = "jenks") + tm_legend(legend.position = c("RIGHT", "TOP")) + tm_compass(position = c("RIGHT",
"BOTTOM"), type = "rose", size = 2) + tm_scale_bar(position = c("RIGHT", "BOTTOM"),
breaks = c(0, 2, 4), text.size = 1) + tm_credits(position = c("RIGHT", "BOTTOM"),
text = "Adela Sobotkova, 2022") + tm_layout(main.title = "Copenhagen 1950 situation",
legend.outside = FALSE)Flats through time
Now, that you have mastered visualization of a single year, let’s plot all the years we have available!
tmap_options(limits = c(facets.view = 5)) # we want to view 5 periods
tmap_mode(mode = "view")
tm_shape(wc_spatial) + tm_facets(by = "year", ncol = 3, nrow = 2) + tm_polygons("flats",
id = "navn", style = "jenks") + tm_layout(main.title = "Copenhagen Flats", legend.outside = TRUE)Lets’ look at flats per square kilometer
Now that we have a spatial object, we can create new columns, for example utilizing the shape area to calculate the density of flats per sq km.
library(tmap)
tmap_options(limits = c(facets.view = 5)) # we want to view 6 years
tmap_mode(mode = "view")
tm_shape(wc_spatial) + tm_facets(by = "year", ncol = 3, nrow = 2) + tm_polygons("flat_per_km",
n = 5, style = "jenks") #+
## Access to toilets and baths, per suburb and square
kilometer
Lets calculate the baths and toilets available per square kilometer per each suburb
library(tmap)
tmap_options(limits = c(facets.view = 5)) # we want to view 5 years
tmap_mode(mode = "view")
tm_shape(wc_spatial) + tm_facets(by = "year", ncol = 5, nrow = 1) + tm_polygons("pct_bath",
id = "navn", style = "pretty", title = "% of flats with <br> access to a bath") #+
library(tmap)
tmap_options(limits = c(facets.view = 5)) # we want to view 5 periods
tmap_mode(mode = "view")
tm_shape(wc_spatial) + tm_facets(by = "year", ncol = 3, nrow = 2) + tm_polygons("pct_wc",
id = "navn", style = "pretty", title = "% of flats with <br>access to WC")
## Total baths per area
Recalculate the number of baths to total per sq kilometer
..or continue with communal resources and warm water (OPTIONAL)
Why not practice and try plotting the flats that have access to communal baths and wc, and or hot water? Create your own map here, following the examples above.
Get additional data for Copenhagen from OpenStreetMap
The OpenStreetMap contains free and open spatial data for physical features on the ground, with each features’ type being define using key:value pair tags. Each tag describes a geographic attribute of the feature being shown by that specific node, way or relation.
Extract OSM data
Use:
osmdata:opq()to define the bounding box of the osm requestosmdata:add_osm_feature()to define the key:value pairs you are looking forosmdata:osmdata_sf()to retrieve the osm data.
library(osmdata)
# Create a bounding box
bb <- suburbs %>%
st_transform(4326) %>%
st_bbox()
plot(bb)# Define the bb of the OSM request
q <- opq(bbox = bb, timeout = 180)
# Various key:value pairs you can explore
qa <- add_osm_feature(q, key = "amenity", value = "public_bath")
# qb <- add_osm_feature(q, key = 'amenity',value = 'drinking_water')
qc <- add_osm_feature(q, key = "amenity", value = "shower")
qd <- add_osm_feature(q, key = "amenity", value = "toilets")
# qe <- add_osm_feature(q, key = 'amenity',value = 'water_point')
# Retrieve the data
# well, first, let's make sure R can get to the internet as OSM can have issues
# here:
# https://stackoverflow.com/questions/69264448/error-overpass-query-unavailable-without-internet-while-connected-to-internet
# https://github.com/ropensci/osmdata/issues/243
assign("has_internet_via_proxy", TRUE, environment(curl::has_internet))
public_bath <- c(osmdata_sf(qa), osmdata_sf(qc), osmdata_sf(qd))Clean up OSM data
Use the following code to clean the results and project them in Danish UTM.
This code:
- removes the duplicated geometries thanks to
osmdata::unique_osmdata(see the documentation for details) - projects into WGS84 UTM32
- keeps the name attribute only
- computes the centroids for the baths stored as polygons
- Eventually, the baths outside our CPH suburbs are removed
library(osmdata)
bath_uniq <- unique_osmdata(public_bath)
rpoint <- bath_uniq$osm_points %>%
filter(!is.na(amenity)) %>%
st_transform(32632) %>%
dplyr::select(name)
rpoly <- bath_uniq$osm_polygons %>%
st_transform(32632) %>%
dplyr::select(name) %>% st_centroid()
baths_osm <- rbind(rpoly,rpoint)
baths_osm <- st_intersection(baths_osm, st_transform(suburbs, 32632) %>% st_geometry() %>% st_union())
# transform also historical baths
baths_cph <- wc_spatial %>%
st_make_valid() %>% # here I am fixing a duplicate vertex
st_centroid() %>%
st_transform(32632) %>%
mutate(radius = sqrt(bath_per_km)) %>%
arrange(desc(bath_per_km))Display two maps side-by-side
Now, let’s display the results in two synchronized
mapview maps:
- one with baths extracted from OSM
- another one with bathing resources in suburbs
- Use the
mapview::syncfunction to display both maps side by side with synchronisation.
library(mapview)
library(leafsync)
map_osm <- mapview(suburbs) + mapview(baths_osm, map.types = "OpenStreetMap", col.regions = "#940000",
label = as.character(suburbs$name), color = "white", legend = FALSE, layer.name = "Baths in OSM",
homebutton = FALSE, lwd = 0.5)
map_cph <- mapview(suburbs) + mapview(baths_cph[, -3], map.types = "OpenStreetMap",
col.regions = "#940000", color = "white", cex = "bath_per_km", legend = TRUE,
layer.name = "Baths per sq km <br>in suburbs from 1970", homebutton = FALSE,
lwd = 0.5)
sync(map_osm, map_cph)What a fantastic synced map! Two maps show entirely different datasets (OSM baths in 2021 versus historical data from suburbs) moving interactively. The synced map functionality is nice, but the comparison does not make much sense: individual datapoints for OSM public bathroom versus private bathing facilities aggregated by suburb polygons are not exactly comparable. The synced map underscores the problems of aggregation, and the uncertainty on where the suburb data represents on the ground.
Question:
- How could we improve the maps and more meaningfully compare the OSM and suburb data?
Improve the display with some comparable dataset
It might be better to combine the OSM data with the public bathhouse data that we had looked at previously in Leaflet.
We need to
- load the data from google spreadsheet
- filter out missing coordinates and convert to sf object
- project to WGS84 UTM 32
# baths <-
# read_sheet('https://docs.google.com/spreadsheets/d/15i17dqdsRYv6tdboZIlxTmhdcaN-JtgySMXIXwb5WfE/edit#gid=0',
# col_types = 'ccnnncnnnc') write_rds(baths,'../data/baths.rds')
baths <- read_rds("../data/baths.rds")
names(baths) [1] "BathhouseName" "Coordinates" "Longitude"
[4] "Latitude" "Quality" "AlternativeCoordinates"
[7] "Long" "Lat" "ErrorRadius_m"
[10] "Notes"
hist_bathhouses <- baths %>%
dplyr::select(BathhouseName, Longitude, Latitude, Quality) %>%
filter(!is.na(Longitude)) %>%
st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326)
hist_baths <- st_transform(hist_bathhouses, crs = 32632)Now, let’s load this projected historical bathouse object in the synced map so we can compare the locations with OSM data.
library(mapview)
map_osm <- mapview(baths_osm, map.types = "OpenStreetMap",
col.regions = "#940000",
label = as.character(suburbs$name),
color = "white", legend = FALSE, layer.name = "Baths in OSM",
homebutton = FALSE, lwd = 0.5)
map_hist <- mapview(hist_baths,
map.types = "OpenStreetMap",
col.regions = "#940000",
color = "white",
# cex = "bath_per_km",
legend = TRUE,
layer.name = "Public bathhouses, early 20th century",
homebutton = FALSE, lwd = 0.5)
library(leafsync)
sync(map_osm,map_hist)
Lovely comparison of apples and apples (more or less). Basically
we have two different point patterns, showing current public baths and
toilets in Copenhagen and historical ones from early 20th century. The
city has grown, hygienic standards have risen, and so clearly have the
hygienic facilities. While you can see some spatial differences and may
postulate the reason for increased density of points, how would you
assess it formally?
In the next section, next wee, you will learn how to formally evaluate
the (dis)similarity of spatial patterning between the historical and
current data.
Comparing two point patterns. How do we best do it?
We have two patterns, historical and OSM data. Are they similar or dissimilar? How do the patterns of historical and current public bathhouses compare beyond a quick eyeball evaluation?
Here we might be able to use some statistical functions that contrast nearest neighbor distances or multi-distance clustering across the two groups.
We will learn how to do that in a couple of weeks!